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Abstract 

A  number  of  issues  arise  when  trying  to  scale- 
up  natural  language  understanding  (NLU)  tools 
designed  for  relatively  simple  domains  (e.g., 
flight  information)  to  domains  such  as  medical 
advising  or  tutoring  where  deep  understanding 
of  user  utterances  is  necessary.  Because  the 
subject  matter  is  richer,  the  range  of  vocabu¬ 
lary  and  grammatical  structures  is  larger  mean¬ 
ing  NLU  tools  are  more  likely  to  encounter 
out-of-vocabulary  words  or  extra-grammatical 
utterances.  This  is  especially  true  in  med¬ 
ical  advising  and  tutoring  where  users  may 
not  know  the  correct  vocabulary  and  use  com¬ 
mon  sense  terms  or  descriptions  instead.  Tech¬ 
niques  designed  to  improve  robustness  (e.g., 
skipping  unknown  words,  relaxing  grammat¬ 
ical  constraints,  mapping  unknown  words  to 
known  words)  are  effective  at  increasing  the 
number  of  utterances  for  which  an  NLU  sub¬ 
system  can  produce  a  semantic  interpretation. 
However,  such  techniques  introduce  additional 
ambiguity  and  can  lead  to  a  loss  of  fidelity  (i.e., 
a  mismatch  between  the  semantic  interpreta¬ 
tion  and  what  the  language  producer  meant). 

To  control  this  trade-off,  we  propose  semantic 
interpretation  confidence  scores  akin  to  speech 
recognition  confidence  scores,  and  describe  our 
initial  attempt  to  compute  such  a  score  in  a 
modularized  NLU  sub-system. 

1  Introduction 

Applications  such  as  automated  medical  advice  and  tu¬ 
toring  use  rich  knowledge  representations,  and  natural 
language  input  to  dialogue  systems  in  these  domains 
contains  a  wide  range  of  vocabulary  and  grammatical 
structures.  Natural  language  understanding  (NLU)  tools 


may  fail  when  encountering  out-of-vocabulary  words  or 
extra-grammatical  utterances.  Using  robustness  features 
such  as  skipping  unknown  words  and  mapping  unknown 
words  to  known  words  can  allow  an  NLU  sub-system  to 
recover  fully  from  failure,  or  at  least  to  extract  pieces  of 
meaning  that  can  be  used  for  a  directed  clarification  ques¬ 
tion  (see  (Gabsdil,  2003;  Rose,  1997)  for  more  details). 
Such  robustness  is  especially  critical  for  the  domain  of 
tutoring.  Educators  are  interested  in  using  dialogue  to 
address  conceptual  errors  instead  of  focusing  on  termi¬ 
nology  errors,  and  students  want  “partial  credit”  from  tu¬ 
tors  if  they  have  the  right  general  idea.  Thus,  NLU  sub¬ 
systems  for  tutoring  must  attempt  to  extract  correct  ele¬ 
ments  from  student  answers  that  they  do  not  fully  under¬ 
stand. 

The  problem  with  such  robustness  features  is  that  they 
introduce  additional  ambiguity  and  can  lead  to  a  loss  of 
fidelity  (i.e.,  a  mismatch  between  the  semantic  interpre¬ 
tation  and  what  the  language  producer  meant)  if  no  clar¬ 
ification  or  confirmation  request  is  made.  We  argue  that 
semantic  interpretation  confidence  scores  (akin  to  speech 
recognition  confidence  scores)  are  necessary  to  properly 
manage  this  robustness  versus  fidelity  trade-off.  It  is  not 
enough  to  ask  for  clarification  of  missing  information; 
the  dialogue  system  needs  confidence  scores  so  it  can  de¬ 
cide  what  information  present  in  the  logical  form  (if  any) 
should  be  clarified  or  confirmed. 

In  addition  to  avoiding  misunderstandings,  dialogue 
systems  should  be  sensitive  to  the  lexical  and  syntactic 
choices  made  by  users.  Pickering  and  Garrod  (under  re¬ 
vision)  argue  that  conversational  agents  track  each  other’s 
use  of  referring  expressions.  In  experimental  contexts 
such  as  games  involving  mazes,  participants  often  im¬ 
plicitly  agree  on  a  conventional  way  of  referring  to  en¬ 
tities  such  as  places  in  the  maze.  If  dialogue  systems  do 
not  mimic  this  alignment  process,  users  can  easily  be¬ 
come  confused  if  they  make  a  reference  such  as  “the  two 
o’clock  flight”  but  the  system  talks  about  “BA  112”  in- 
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stead.  In  some  domains  (e.g.,  tutoring,  medical  advising), 
the  process  is  more  complex  as  users  may  use  common 
sense  terms  or  descriptions  instead  of  correct  terminol¬ 
ogy.  The  system  should  not  use  incorrect  terminology, 
but  instead  teach  the  correct  terms.  To  enable  dialogue 
systems  to  align  or  correct  user-referring-expressions,  the 
NLU  sub-system  must  provide  pointers  from  the  seman¬ 
tic  interpretation  back  to  the  words  and  syntax  produced 
by  the  user. 

In  the  body  of  this  paper,  we  present  the  architecture  of 
our  NLU  sub-system.  Features  of  our  architecture  such  as 
a  packed  parse  tree  representation  and  a  two-stage  seman¬ 
tic  interpretation  process  provide  efficiency  and  portabil¬ 
ity  advantages.  However,  they  complicate  the  calculation 
of  confidence  scores  and  maintenance  of  links  between 
the  words  and  syntax  produced  by  the  user  and  the  output 
of  NLU.  The  goal  of  this  paper  is  to  highlight  the  archi¬ 
tectural  trade-offs  involved  in  controlling  fidelity. 

2  Our  Architecture 

Our  NLU  sub-system,  NUBEE,  is  part  of  a  tutorial  dia¬ 
logue  system,  BEETLE  (Basic  Electricity  and  Electron¬ 
ics  Tutorial  Learning  Environment)  (Zinn  et  ak,  forth¬ 
coming).  BEETLE  users  are  given  tasks  to  perform  in 
the  circuit  simulator  pictured  in  figure  1.  Users  manipu¬ 
late  objects  in  this  simulation  as  well  as  conversing  with 
the  system  through  typed  input.  The  typed  input  is  sent 
to  NUBEE  which  queries  the  domain  reasoner,  BEER, 
and  dialogue  history  to  help  build  a  logical  form  which  it 
sends  to  the  dialogue  manager  of  the  system,  the  central 
component  of  BEETLE’S  response  generation. 

NUBEE  uses  an  application-specific  logical  frame¬ 
work  which  closely  resembles  minimal  recursion  seman¬ 
tics  (MRS)  (Copestake  et  ak,  1999).  An  example  logical 
form  is  shown  in  figure  2.  The  identifiers  in  square  brack¬ 
ets  define  the  handles  for  each  of  the  three  elementary 
predications  (EPs).  Handles  are  used  by  one  EP  to  ref¬ 
erence  another  EP.  The  first  EP,  connect",  takes  the  han¬ 
dles  of  two  EPs  as  arguments  (the  handles  for  battery" 
and  wire").  Arguments  can  be  handles  or  atomic  val¬ 
ues.  Note,  we  differentiate  the  definition  of  a  handle  from 
a  handle  reference  by  marking  the  former  with  square 
brackets.  The  two  arguments  to  battery"  are  the  atomic 
values  defNP  and  singular.  To  simplify  processing  we 
simply  pass  on  these  syntactic  features,  dejNP  and  sin¬ 
gular,  for  later  processing  rather  than  defining  quantifiers 
such  as  the. 

In  section  5,  we  describe  our  two  stage  interpretation 
process.  Predicates  output  by  the  first  stage  are  marked 
with  a  prime  (e.g.,  connect')  and  predicates  output  by  the 
second  stage  (such  as  those  in  figure  2)  are  marked  with 
a  double  prime. 

Ideally  each  EP  and  each  of  its  individual  atomic  ele¬ 
ments  would  have  a  confidence  score  (reflecting  the  sub- 


(1)  connect  the  battery  to  the  wire 


(*and* 
(connect' ' 
(battery' ' 
(wire' ' 


[idl]  id2  id3) 

[id2]  defNP  singular) 
[id3]  defNP  singular)) 


Eigure  2:  Example  logical  form 


system’s  confidence  that  it  captures  the  speaker’s  mean¬ 
ing),  and  a  link  back  to  the  syntactic  structures  corre¬ 
sponding  to  the  predicate  or  atomic  element.  Such  a 
fine-grained  representation  would  ensure  that  a  dialogue 
system  could  separate  low  fidelity  elements  from  high  fi¬ 
delity  ones,  and  that  all  the  speaker’s  lexical  and  gram¬ 
matical  choices  were  captured. 

Building  such  a  representation  is  difficult  because  the 
NLU  process  consists  of  a  series  of  tasks:  preprocess¬ 
ing  (in  a  typed  system,  this  consists  of  spelling  correc¬ 
tion  and  unknown  word  handling),  syntactic  and  semantic 
analysis,  and  reference  resolution.  Backward  links  must 
be  built  across  these  processing  steps  and  each  step  in¬ 
troduces  ambiguity  (e.g.,  the  parser  will  output  multiple 
possibilities).  In  section  7,  we  see  that  the  system’s  parser 
uses  a  packed  representation  for  ambiguity.  Such  a  rep¬ 
resentation  is  efficient  but  the  connection  between  indi¬ 
vidual  syntactic  structures  and  semantic  structures  is  lost 
meaning  these  links  must  be  recreated  post-hoc. 

NUBEE’s  architecture  is  shown  in  figure  3.  The 
spelling  correction,  parsing,  and  post-processing  com¬ 
ponents  were  built  using  the  Carmel  workbench  (Rose, 
2000;  Rose  et  ak,  2003).  In  our  architecture  for  typed 
NLU,  speech  recognition  is  replaced  by  unknown  word 
handling  and  spelling  correction  which  we  discuss  in  sec¬ 
tions  3  and  4.  In  these  modules,  it  is  relatively  easy  to 
calculate  confidence  scores  and  record  the  transforma¬ 
tions  made  to  the  user  input.  These  modules  can  make 
dramatic  changes  to  the  user  input  so  it  is  unclear  why 
current  NLU  sub-systems  do  not  track  these  transforma¬ 
tions. 

In  section  5,  we  discuss  the  parsing  and  post¬ 
processing  modules  (parts  of  the  Carmel  workbench).  We 
highlight  why  it  is  difficult  to  assign  confidence  scores 
to  Carmel’s  output  and  maintain  links  between  logical 
predicates  and  the  corresponding  words  typed  by  the  user. 
Section  6  discusses  our  reference  resolution  module  and 
how  it  calculates  confidence  scores  and  records  the  trans¬ 
formations  that  it  makes  (from  logical  predicates  to  sim¬ 
ulated  objects  in  the  domain  reasoner).  In  section  7,  we 
discuss  how  we  calculate  global  confidence  scores  and 
link  references  to  simulated  objects  back  to  the  user’s  re¬ 
ferring  expression. 
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PARTS  BOX 


Continue  witli  Next  Lesson 


■  Lessons  (Curreiii) 


■  Progress  Check  2 


P  Measurement  of  Current 


■  Progress  Check  S 
M  nogress  Check  8 

■  Progress  Check  7 


Previous  Dialogue 

In  the  first  exercise  you  will  get  practice  measuring  current  with  the  multimeter, 
what  is  current? 

Current  is  the  flow  of  electricity  through  a  circuit.  Are  you  ready  to  start? 
and  what  is  a  battery 

Let's  discuss  the  definition  of  a  battery  later.  Are  you  ready  to  start? 
no 


Are 


you  ready  to  start? 


■M 


Welly  let's  start  anyway.  The  first  step  is  to  de-energize  the  circuit.  Go  ahead  and  f 
how  do  i  deenergise? 

You  have  to  open  the  switch.  Try  again  and  de-energize  the  circuit, 
student  removed  wire  1 

If  you  remove  Wire-1  from  the  circuit,  then  this  will  break  the  circuit  but  does  it  pc 
student  deenergized  the  circuit 

Good.  Setting  the  function  switch  is  the  next  task  you  have  to  perform.  I  suggest  you 
The  student  set  the  function  switch  to  cac 

Not  quite,  but  you're  getting  there.  Have  another  try  and  set  the  function  switch  to  I 
The  student  set  the  function  switch  to  vdc 

No,  but  you  are  getting  closer.  Try  again  and  set  the  function  switch  to  C-DC. 


C'wer  it  down. 

wer  the  circuit  down? 
go  ahead  and  set  it  to  C-DC. 

C  ammeter . 


Student 


Your  Turn, 


Figure  1:  NLU-centric  diagram  of  BEETLE 


3  Spelling  Correction 

NUB  EE’s  spelling  corrector  (Elmi  and  Evens,  1998)  and 
robust  parser  are  both  part  of  the  Carmel  workbench  and 
the  interface  between  the  two  is  predefined.  The  spelling 
corrector  uses  the  parser’s  lexicon  as  its  dictionary  and  at¬ 
tempts  to  fix  spelling  and  typing  errors  in  known  words. 
Since  the  parser’s  lexicon  is  typically  much  smaller  than 
a  lexical  database  such  as  WordNet  (Miller,  1990),  there 
is  a  reduction  in  token  ambiguity  (i.e.,  the  number  of  pos¬ 
sible  replacements  to  consider)  but  the  spelling  of  un¬ 
known  words  will  not  be  corrected.  The  simplification 
is  also  made  that  known  words  are  never  misspelled  ver¬ 
sions  of  other  known  words  (e.g.,  typing  “their”  instead 
of  “there”). 

The  spelling  corrector  uses  string  transformations  to 
attempt  to  repair  spelling/typing  errors.  Because  repeated 


transformations  will  map  any  input  string  to  a  word  in 
the  dictionary,  transformations  are  given  penalty  scores 
and  a  threshold  defines  an  allowable  spelling  correction. 
However,  the  spelling  corrector’s  decisions  are  final;  re¬ 
placements  whose  penalty  scores  are  below  the  threshold 
are  entered  directly  into  the  parser’s  chart  but  the  penalty 
scores  are  not  passed  on. 

To  produce  a  record  of  spelling  corrector  transforma¬ 
tions,  we  create  a  word  transformation  table  for  every 
new  utterance.  Each  transformation  is  recorded  in  a  ta¬ 
ble  entry  consisting  of  the  original  word,  the  transformed 
word  (after  spelling  correction),  and  an  associated  confi¬ 
dence  score.  We  have  modified  the  spelling  corrector  to 
return  a  confidence  score  based  on  the  number  of  alterna¬ 
tives  that  it  proposes.  We  show  below  the  transformation 
table  recording  the  spelling  corrector’s  output  for  the  mis¬ 
spelled  word  “socet”: 


Figure  3:  NUBEE  architecture 


socet  ->  socket,  0.5 
->  set,  0.5 

In  future  work,  we  plan  to  modify  the  spelling  correc¬ 
tor  so  that  it  outputs  its  penalty  score. 

4  Unknown  Word  Handling 

Carmel’s  robust  parser  can  skip  words  while  attempt¬ 
ing  to  find  an  analysis  for  the  user  input.  True  un¬ 
known  words  (not  spelling  errors)  will  be  skipped  be¬ 
cause  Carmel  will  have  no  information  about  their  syn¬ 
tactic  or  semantic  features.  Eor  some  unknown  words, 
this  is  the  best  solution  because  they  represent  concepts 
not  having  an  obvious  link  to  knowledge  modeled  by  the 
system  (users  should  be  alerted  to  the  system’s  limita¬ 
tion).  However,  we  are  aware  of  no  work  on  attempting 
to  recognize  novel  ways  of  referring  to  entities  modeled 
by  the  system. 

Our  approach  is  to  find  known  synonyms  (i.e.,  in 
NUBEE’s  lexicon)  of  unknown  words  using  the  Word- 
Net  lexical  database  (Miller,  1990).  In  WordNet,  words 
are  connected  to  one  or  more  synsets  each  corresponding 
to  a  distinct  concept.  Each  synset  will  have  a  set  of  hy- 
ponymy  (all  the  subtypes  of  the  synset)  and  hypernymy 
(the  supertype  of  the  synset)  links. 

Currently,  we  use  a  very  simple  search  process  to 
look  for  a  known  word  whose  meaning  approximates  the 
meaning  of  the  unknown  word.  We  assign  a  part-of- 
speech  (POS)  tag  to  the  unknown  word,  and  search  the 
appropriate  WordNet  taxonomy.  We  retrieve  the  synsets 
associated  with  the  word  and  run  the  search  procedure 
SEARCH-WORD  stopping  when  a  known  word  is  found. 

procedure  SEARCH-WORD  (SYNSETS) 

1.  SEARCH-DOWN  (SYNSETS) 

2.  if  height  threshold  not  reached  then 

SEARCH-WORD  (hypernyms  for  SYNSETS) 

procedure  SEARCH-DOWN  (SYNSETS) 

1.  search  all  words  having  a 

synset  in  SYNSETS 

2.  SEARCH-DOWN  (all  hyponyms  of  SYNSETS) 

Nodes  in  the  WordNet  taxonomy  close  to  its  root 
have  relatively  general  meanings  (e.g.,  social-relation, 


physical-object)  so  we  define  a  limit  (height  threshold) 
to  how  far  the  search  can  progress  up  the  taxonomy. 

To  make  a  record  of  unknown-word-handling  transfor¬ 
mations,  we  add  additional  entries  to  the  word  transfor¬ 
mation  table  output  by  spelling  correction.  We  use  the 
size  of  the  search  space  to  calculate  confidence  scores, 
treating  the  set  of  replacement  words  retrieved  in  each 
step  of  the  search  process  as  equally  likely.  Consider 
the  example  of  the  unknown  word  “cable”.  Step  one 
of  SEARCH-DOWN  returns:  “telegraph”,  “wire”,  and 
“fasten-with-a-cable”.  “wire”  is  a  known  word,  and  “ca¬ 
ble”  is  replaced  by  “wire”  with  a  confidence  of  0.33: 
cable  ->  wire,  0.33 

5  Carmel  Workbench 

We  use  the  Carmel  workbench  (Rose,  2000;  Rose  et  al., 
2003)  for  parsing  and  post-processing.  In  Carmel’s  AU- 
TOSEM  framework: 

“semantic  interpretation  [operates]  in  parallel 
with  syntactic  interpretation  at  parse  time  in  a 
lexicon  driven  fashion.  ...  [Semantic]  knowl¬ 
edge  is  encoded  declaratively  within  a  mean¬ 
ing  representation  specification.  Semantic  con¬ 
structor  functions  are  compiled  automatically 
from  this  specification  and  then  linked  into  lex¬ 
ical  entries”  (Rose,  2000,  p.  311). 

Carmel  comes  with  a  wide-coverage  English  grammar 
that  is  compatible  with  the  wide-coverage  COMDEX  lex¬ 
icon  (Grishman  et  al.,  1994).  Eor  each  COMDEX  entry 
that  we  wanted  to  add  into  NUBEE’s  lexicon,  we  speci¬ 
fied  its  meaning  as  shown  below  for  the  words  “connect”, 
“battery”,  and  “wire”. 

connect:  connect',  sub ject->agent, 
ob  ject->theine, 
modlf ler->destlnatlon 
battery:  battery' 
wire :  wire ' 

This  simplified  example  of  the  meaning  specification 
assigns  a  predicate  to  each  word,  and  in  the  case  of  a 
verb  such  as  “connect”  assigns  a  mapping  from  the  syn¬ 
tactic  roles  of  subject,  object,  and  modifier  to  the  seman¬ 
tic  roles  of  agent,  theme,  and  destination.  This  repre¬ 
sentation  is  domain-independent  and  reusable;  it  will  al¬ 
ways  be  the  case  that  the  subject  of  “connect”  realizes  the 


(2a)  (2c) 

connect  the  battery  to  the  cable  (wire) 

(*and* 

(frame  connect') 

(theme  (*and*  (*and*  (frame  battery') 

(number  singular) ) 
(def  defNP) ) ) 

(destination  (*and* 

(*and*  (frame  wire') 

(number  singular) ) 


( *or* 


(connect''  [id4]  " 

" I  I  I  WIRE-1461") 
(connect' '  [idS]  " 
" I  I  I  WIRE-1441") 
(connect' '  [id6]  " 

" I  I  I  WIRE-1451") 
(connect''  [id7]  " 

"III REDLEADl") 
(connect' '  [id8]  " 

"III BLACKLEADl") 


I  I  I BATTl" 

I  I  I BATTl" 

I  I  I BATTl" 

I  I  I BATTl" 

I  I  I BATTl" 

) 


(def  defNP) ) ) ) 

Figure  4:  Example  semantic  feature  value 


Figure  6:  Example  logical  form  after  reference  resolution 


(2b) 

(*and* 

(connect''  [idl]  id2  id3) 

(battery''  [id2]  defNP  singular) 
(wire''  [id3]  defNP  singular)) 

Figure  5:  Example  logical  form  from  section  1 

agent,  the  object  realizes  the  theme,  and  that  the  destina¬ 
tion  (if  present)  will  be  realized  as  a  modifier. 

Figure  4  shows  a  simplified  version  of  the  parser’s  out¬ 
put  given  the  utterance,  “connect  the  battery  to  the  ca¬ 
ble”.  Recall  from  section  4  that  the  unknown  word  han¬ 
dler  will  replace  “cable”  with  “wire”  leading  to  the  wire' 
predicates  in  figure  4. 

The  two  occurrences  of  the  definite  article  “the”  trig¬ 
ger  the  feature  values  (def  defNP),  marking  that  battery' 
and  wire'  occurred  in  definite  NPs.  The  nouns,  “battery” 
and  “wire”,  have  the  associated  syntactic  feature  of  being 
singular,  and  the  features  theme  and  destination  mark  the 
semantic  roles  of  battery'  and  wire' . 

Dialogue  systems  use  domain-specific  reasoners  to 
process  the  output  of  NLU  sub-systems  (e.g.,  to  answer  a 
user  question  or  execute  a  user  command,  or  to  judge  the 
correctness  of  a  student  answer).  Such  domain  reasoners 
generally  expect  input  in  a  predefined,  domain-specific 
format  necessitating  a  second  stage  of  processing  to  con¬ 
vert  the  parser’s  output  into  the  correct  format. 

Our  domain  reasoner’s  representation  for  the  connect 
action  is  a  predicate,  connect",  taking  as  arguments,  the 
two  objects  to  be  connected.  Carmel  provides  support  for 
building  such  predicates  from  the  parser’s  output  based 
on  a  declarative  specification.  Based  on  our  specification 
for  connect",  Carmel’s  predicate  mapper  will  produce  the 
logical  form  shown  in  figure  5  (a  copy  of  figure  2). 

During  the  parsing  and  post-processing  stages,  the 
string  returned  from  pre-processing  (spelling  correction 
and  unknown  word  handling)  is  transformed  into  a  series 
of  predicates.  We  currently  do  not  keep  track  of  all  the 
connections  between  the  predicates  and  the  words  that 
formed  them.  The  predicate  mapping  stage  is  difficult  to 


unravel;  the  mapping  rules  operate  on  semantic  feature 
values  (such  as  those  shown  in  figure  4).  There  is  no  di¬ 
rect  link  between  pieces  of  semantic  feature  values  and 
the  words  that  triggered  them  so  it  is  difficult  to  associate 
the  output  of  predicate  mapping  with  words.  See  section 
7  for  more  details  and  our  interim  solution. 

6  Reference  Resolution 

For  each  predicate  corresponding  to  a  physical  entity, 
the  reference  resolution  module  must  decide  whether  the 
predicate  refers  to:  a  concept  (a  generic  reading),  a  novel 
object  (indefinite  reference),  or  an  existing  object  (defi¬ 
nite  reference).  If  the  predicate  refers  to  an  existing  ob¬ 
ject,  the  predicate  (e.g.,  wire")  may  match  several  objects 
in  the  domain  reasoner  but  the  speaker  may  only  be  refer¬ 
ring  to  a  subset  of  these  objects. 

Figure  6  shows  the  example  from  figure  5  after  refer¬ 
ence  resolution.  The  predicate  battery"  is  replaced  by 
the  name  of  the  one  battery  present  in  figure  1,  but  wire" 
could  refer  to  any  of  the  five  wires  in  the  circuit  leading 
to  the  ambiguity  depicted. 

NUBEE  can  query  the  dialogue  system’s  history  list 
to  assign  salience  and  calculate  confidence  scores  for 
the  transformations  it  makes.  These  transformations  are 
stored  in  a  table  such  as  the  one  below  (assume  that 
"  I  I  I  WI  RE  - 1 4  6 1 "  had  been  mentioned  previously  but 
the  other  wires  had  not): 

(battery''  [id2]  def  singular)  -> 

"UIBATTl"  (1.0) 

(wire''  [id3]  def  singular)  -> 

" I  I  I  WIRE-1461"  (0.6)  " I  I  I  WIRE-1441"  (0.1) 
" I  I  I  WIRE-1451"  (0.1)  " I  I  I REDLEADl"  (0.1) 

" I  I  I BLACKLEADl"  (0.1) 

In  the  next  section,  we  will  see  how  these  table  entries 
are  matched  with  words  from  the  input. 

7  Improving  Fidelity 

We  are  currently  focusing  on  improving  fidelity  for  re¬ 
ferring  expressions:  assigning  confidence  scores  to  the 
objects  retrieved  during  reference  resolution  and  linking 


referenced  objects  to  the  words  used  to  refer  to  them.  We 
make  the  assumption  that  all  referring  expressions  are 
NPs  and  build  a  table  of  predicates  (formed  from  NPs) 
and  the  words  associated  with  those  NPs.  A  sample  entry 
is  shown  below. 

(wire''  [id2]  def  singular) 

<=  ' 'the  wire' ' 

We  run  the  following  procedure  on  each  NP  in  the  one 
parse  tree  node  covering  the  input  (in  the  parser’s  packed 
representation  there  will  only  be  one  such  node  with  the 
category  utterance). 

procedure  PROCESS-NP  (NP) 

1.  run  the  predicate  mapper  just  on 

NP  to  get  its  associated  predicate 

2.  follow  the  children  of  NP  to  find 

the  words  associated  with  it 

The  next  step  is  consulting  the  spelling  corrector  and 
unknown  word  handler.  In  section  3,  we  introduced  a 
table  of  word  substitutions  with  associated  confidence 
scores.  This  table  can  be  used  to  replace  the  words  found 
in  the  chart  with  the  words  actually  typed  by  the  user  and 
to  compute  a  global  confidence  score  by  multiplying  the 
confidence  scores  of  the  individual  words  with  the  confi¬ 
dence  score  assigned  during  reference  resolution. 

In  section  4,  we  discussed  unknown  word  handling  for 
“the  cable”;  combining  this  result  with  the  reference  table 
computed  above  gives  us: 

"HI  WIRE-1461", 

(wire''  [id3]  def  singular), 

0.198,  '  'the  cable' ' 

"HI  WIRE-1441", 

(wire''  [id3]  def  singular), 

0 . 033,  '  'the  cable' ' 

"HI  WIRE-1451", 

(wire''  [id3]  def  singular), 

0 . 033,  '  'the  cable' ' 

"HI  REDLEADl", 

(wire''  [id3]  def  singular), 

0 . 033,  '  'the  cable' ' 

"HI  BLACKLEADl", 

(wire''  [id3]  def  singular), 

0.033,  ''the  cable'' 

One  complication  is  that  NPs  can  be  associated  with 
an  ambiguous  set  of  words.  Consider  a  nonsense  word 
in  our  domain,  “waters”.  The  spelling  corrector  will  pro¬ 
pose  “watts”  and  “meters”  as  possible  replacements.  In 
the  parser’s  packed  representation,  the  nouns  “watts”  and 
“meters”  share  the  same  node.  A  disjunctive  set  of  fea¬ 
tures  represents  the  ambiguity. 


(*or* 

(*and*  (frame  watts') 

(number  plural) 

(root  watt) ) 

(*and*  (frame  meters') 

(number  plural) 

(root  meter) ) ) 

This  ambiguity  is  propagated  to  the  NP  node  but  the 
connection  between  the  word  stems  (the  root  feature)  and 
the  two  NP  meanings  is  lost. 

(*or* 

(*and*  (frame  watts') 

(number  plural) 

(def  indef ) ) 

(*and*  (frame  meters') 

(number  plural) 

(def  indef) ) ) 

We  modified  PROCESS-NP  to  deal  with  such  cases  by 
adding  an  additional  step: 

3.  for  each  meaning,  M  of  the  NP 
3.1  try  to  match  M  with  one  or 

more  of  the  words  associated 
with  the  NP  (i.e.,  run  the 
predicate  mapper  just  on 
the  word  and  see  if  it  matches 
one  of  the  meanings  of  the  NP) 

8  Discussion 

Although  there  has  been  work  on  controlling  the  fidelity 
of  individual  components  of  the  pipeline  shown  in  figure 
3,  there  has  been  little  work  considering  the  NLU  sub¬ 
system  as  a  whole.  Gabsdil  and  Bos  (2003)  incorporate 
(speech  recognizer)  confidence  scores  into  their  logical 
form  for  elements  that  correspond  directly  to  words  in  the 
input  (rather  than  larger  structures  built  through  compo¬ 
sition).  Consider  the  example  of  the  word  “manager”  and 
assume  it  has  a  speech  recognition  confidence  score  of 
0.65.  Gabsdil  and  Bos’  parser  will  assign  “manager”  the 
semantic  value  of  lj:MANAGER(vi)  where  Ij  is  a  handle 
and  Vi  a  variable.  This  semantic  value  is  given  a  con¬ 
fidence  score  of  0.65  the  same  as  “manager”.  To  com¬ 
pute  confidence  scores  for  larger  constituents  they  sug¬ 
gest  to  “combine  confidence  scores  for  sub-formulas  re¬ 
cursively”  (Gabsdil  and  Bos,  2003,  p.  149). 

We  have  taken  this  idea  further  and  explored  the  is¬ 
sues  involved  in  computing  confidence  scores  for  larger 
constituents.  Some  of  these  issues  are  linked  to  our  two- 
stage  semantic  analysis.  However,  Carmel’s  two-stage 
interpretation  process  {i.e.,  a  domain-independent  parsing 
stage  and  a  domain-dependent  predicate  mapping  stage) 
is  not  idiosyncratic  to  the  Carmel  workbench.  Dzikovska 
et  al.  (2002)  adopt  such  a  two  stage  approach  because 


their  NLU  sub-system  is  used  in  multiple  domains  (e.g., 
transportation  planning,  medication  advice)  necessitating 
reuse  of  resources  wherever  possible.  Milward  (2000) 
uses  a  two  stage  approach  because  it  increases  robust¬ 
ness.  When  the  parser  is  not  able  to  build  a  parse  tree 
covering  the  entire  input,  there  will  still  be  a  semantic 
chart  composed  of  partial  parses  and  their  associated  se¬ 
mantic  feature  values.  For  the  domain  of  airline  flight 
information,  Milward  defines  post-processing  rules  that 
scan  this  semantic  chart  looking  for  information  such  as 
departure  times.  Our  goal  in  this  paper  was  to  highlight 
the  architectural  trade-offs  of  such  features  on  controlling 
fidelity. 

9  Future  Work 

Our  search  process  for  unknown  word  handling  is  rudi¬ 
mentary.  Each  step  of  the  search  procedures  described 
above  returns  a  set  of  replacement  candidates  which  are 
treated  as  equally  likely.  In  future  work,  we  plan  to  re¬ 
vise  this  search  process  to  use  a  distance  metric  such  as 
one  of  those  discussed  in  (Budanitsky  and  Hirst,  2001). 
Such  distance  metrics  take  into  account  factors  such  as 
the  overall  depth  of  the  WordNet  taxonomy  and  the  fre¬ 
quency  of  synsets  in  a  corpus,  and  will  allow  us  to  better 
control  the  search  process. 

Although  we  know  of  no  work  on  using  WordNet 
to  handle  unknown  words  during  interpretation,  there  is 
work  on  using  WordNet  for  lexical  variation  during  gen¬ 
eration.  Jing  (1998)  presents  an  algorithm  for  converting 
WordNet  into  a  domain-specific  taxonomy  of  replaceable 
words.  First,  words  and  synsets  are  removed  that  do  not 
appear  in  a  corpus  representative  of  the  domain. 

The  senses  of  verb  arguments  in  the  corpus  are  disam¬ 
biguated  based  on  the  intuition  that  words  appearing  as 
the  same  argument  to  the  same  verb  should  have  senses 
close  to  each  other  in  WordNet.  Consider  an  example 
from  Jing’s  domain  of  generating  basketball  news  re¬ 
ports.  The  verb  “add”  takes  words  such  as  “rebound”, 
“throw”,  and  “shot”  as  objects.  Jing  states  that  “re¬ 
bound”  and  “throw”  have  senses  that  are  members  of  the 
synset  accomplishment-achievement;  their  other  senses 
do  not  align  in  this  manner  and  are  pruned  from  WordNet. 
“shot”  has  a  sense  that  is  a  hyponym  of  accomplishment- 
achievement  forming  a  small  taxonomy.  This  process 
can  be  used  in  reverse  for  words  not  in  WordNet  such 
as  “layup”,  “layup”  often  occurs  as  an  object  to  “hit”  as 
do  the  known  words  “jumper”  and  “baskets”.  Based  on 
this  information,  “layup”  is  added  to  WordNet  under  the 
synset  accomplishment-achievement,  the  synset  Jing  as¬ 
signs  to  “jumper”  and  “baskets”. 

In  future  work,  we  will  use  such  a  pruning  algorithm 
on  WordNet,  and  in  addition  to  such  static  pruning  (done 
off-line),  we  want  to  try  dynamic  pruning;  e.g.,  to  process 
the  unknown  word  “X”,  in  the  example,  “connect  the  X 


to  tab  4”,  we  should  only  consider  “connectible”  entities 
as  defined  by  the  system’s  ontology. 

Another  area  for  future  work  are  the  parsing  and  post¬ 
processing  steps  (section  5);  these  steps  do  not  maintain 
confidence  scores  nor  input-output  links  making  it  dif¬ 
ficult  to  compute  global  confidence  scores  and  maintain 
links  between  the  words  and  syntax  produced  by  a  user 
and  the  resulting  output  from  NUBEE. 

The  work  discussed  in  section  7  focuses  on  referring 
expressions  realized  as  NPs.  An  incremental  approach  to 
improving  this  work  would  involve  supporting  other  syn¬ 
tactic  categories  such  as  verbs,  and  modifying  NUBEE’s 
confidence  scores  to  penalize  transformations  done  dur¬ 
ing  parsing  and  post-processing  {i.e.,  skipping  words,  use 
of  disambiguation  heuristics). 

A  more  general  approach  would  require  changes  to 
the  Carmel  workbench.  Currently  the  packed  represen¬ 
tation  represents  ambiguity  as  a  disjunction  of  semantic 
feature  values  {e.g.,  figure  4).  Each  of  these  feature  val¬ 
ues  could  have  an  associated  confidence  score  and  a  list 
of  the  words  associated  with  those  semantic  feature  val¬ 
ues.  With  such  a  representation,  the  post-hoc  analysis 
discussed  in  section  7  would  not  be  necessary.  We  could 
compute  confidence  scores  as  follows.  Typically  in  sta¬ 
tistical  parsing,  the  probability  of  a  parse-tree  node  is  the 
product  of  the  probability  of  the  rule  forming  the  node, 
and  the  probabilities  of  its  children.  We  could  propa¬ 
gate  confidence  scores  for  individual  words  in  the  same 
fashion  (making  the  assumption  that  all  rules  are  equally 
likely). 

However,  the  added  complexity  and  associated  in¬ 
creased  processing  load  may  not  be  worth  the  ability 
to  associate  each  element  of  NUBEE’s  output  with  a 
confidence  score  and  syntactic  information.  We  plan 
to  perform  a  detailed  evaluation  to  investigate  the  ef¬ 
fect  of  tracking  fidelity  on  the  generation  of  clarification 
and  confirmation  requests,  and  alignment  of  the  dialogue 
system  with  the  user  (and  where  necessary,  correction). 
Evaluation  will  take  two  forms:  building  a  test  suite  from 
human-human  dialogues  in  this  domain,  and  analysis  of 
user  interactions  with  the  system. 
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